Abstract

Simple sequence repeats (SSRs) have been widely used in maize genetics and breeding because they are co-dominant, easy to score, and highly abundant. In this study, we used whole-genome sequences from 16 maize inbreds and 1 wild relative to determine SSR abundance and to develop a set of high-density polymorphic SSR markers. A total of 264 658 SSRs were identified across the 17 genomes, with an average of 135 693 SSRs per genome, corresponding to a marker density of one SSR every 15.48 kb. (C/G)n, (AT)n, (CAG/CTG)n, and (AAAT/ATTT)n were the most frequent motifs for mono-, di-, tri-, and tetra-nucleotide SSRs, respectively. A comparison of SSR distributions in three representative resequenced genomes showed that SSRs were most abundant in intergenic regions and least frequent in untranslated regions. By comparing SSR sequences and running electronic polymerase chain reaction (e-PCR) analysis across the 17 tested genomes, we created a new database of 111 887 SSRs that could be developed as polymorphic markers in silico. Among these markers, 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78% had mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide motifs, respectively. For the 35 573 polymorphic SSRs among the 111 887 loci, polymorphism information content ranged from 0.05 to 0.83, with an average of 0.31 across the 17 tested genomes. Experimental validation showed that over 70% of the primer pairs generated the target bands with length polymorphism, and these markers should be very powerful for genetic populations derived from the various types of maize germplasm sampled in this study.

Key words: simple sequence repeat; whole-genome sequences; polymorphic SSR markers; teosinte; maize 



1. Introduction

Maize (Zea mays L.) is one of the most important food, 
feed, and industrial crops globally and a model system 
for the study of genetics, evolution, and domestication. 



The authors contributed equally to this work. 



The maize genome is large and complex. The estimated
total size of the draft genome is 2.3 Gb, over 80% of
which consists of repeated sequences of various types.1
The genetic variability in the maize genome can be used
to enhance biotic and abiotic stress tolerance and to
improve agronomic traits such as quality, maturity, and
yield potential. Types of variation at the whole-genome
level include microsatellites or simple sequence repeats
(SSRs), single-nucleotide polymorphisms (SNPs), inser-
tions and deletions (indels), and various types of struc-
tural variation.

SSRs are tandemly repeated mono-, di-, tri-, tetra-,
penta-, and hexa-nucleotide sequence motifs flanked
by unique sequences.2,3 The unique sequences border-
ing the SSR motifs provide templates for specific primers
to amplify SSR alleles via polymerase chain reaction
(PCR), and allelic differences usually result from
variable numbers of repeat units within a microsatellite
structure.4 A larger number of repeat units is gener-
ally associated with greater genotypic variation, and
shorter motifs, such as mono- and di-nucleotides,
usually have more repeats than longer motifs, such as
tetra-, penta-, and hexa-nucleotides. However, shorter
motifs produce more slipped-strand mispairing
(stuttering) during PCR, which often leads to genotyping
errors.5,6 Based on the repetitive architecture, purity,
and complexity of their motifs, SSRs can be classified
as perfect (a single motif in an uninterrupted array),
imperfect, or compound (two or more motifs in
interrupted or uninterrupted arrays). SSR loci with
longer or perfect motifs are known to exhibit higher
allelic variability.5

SSRs have been the genetic markers of choice
because they are easy to score, multiallelic, and
co-dominantly inherited, and because they offer clear
advantages over restriction fragment length
polymorphism and amplified fragment length
polymorphism markers in technical simplicity,
throughput, and automation.7 Compared with SNP
markers, which are generally biallelic,8 SSR markers are
more informative because they can detect multiple
alleles per locus, so they are still commonly used today.

Thanks to the availability of whole-genome and tran-
scriptome sequences in public databases and the
recent advent of bioinformatics tools, the development
of genetic markers, including SSRs, has become much
easier and more cost-effective. Genetic markers can
be obtained by screening genomic or cDNA sequences
or libraries of clones. To facilitate access to and use
of SSR markers in Brachypodium, 27 329 SSR markers
were designed through genome-wide analysis, whereas
only 398 SSR markers had been developed from its
bacterial artificial chromosome end and expressed
sequence tag databases.9 The completed soybean
whole-genome sequence likewise provided an ideal
resource for the genome-wide development of
locus-specific SSR markers, and 33 065 highly
polymorphic SSRs were developed together with their
genome positions and primer sequences.10 Barchi
et al.11 combined the recently developed restriction
site-associated DNA approach with Illumina DNA
sequencing to rapidly discover a large number of SNP
and SSR markers for eggplant. Huang et al.12 identified
over 3.6 million SNPs by sequencing 517 rice landraces,
which were used in genome-wide association studies
for 14 agronomic traits. These results show that genetic
markers such as SSRs and SNPs are abundant in
different crop genomes and can be scored easily,
making them more accessible to breeders and
geneticists.

SSRs are abundant and well distributed throughout
the maize genome, which makes them a preferred
marker system. SSR markers have been used
extensively in maize to characterize genetic structure
and diversity, to construct phylogenetic trees, to
define potential heterotic groups, and to identify
unique sources of allelic diversity.13-15 Furthermore,
because of their ubiquity and high level of
polymorphism, SSR markers have been widely used for
genetic map construction, quantitative trait locus
(QTL) mapping, map-based cloning, and marker-assisted
selection (MAS). Hence, enriching the current maize
linkage maps with more SSR markers is of great value
for maize molecular breeding worldwide.

In recent years, many SSR markers have been devel-
oped from target sequences in different maize
germplasm accessions and made publicly available
(http://www.maizegdb.org/ssr.php). However, a rela-
tively low level of polymorphism has been observed
between cultivated maize and its relatives, and within
populations derived from cultivated × teosinte and
temperate × tropical maize crosses. The availability of
the reference genome sequence and increasingly cost-
effective sequencing facilities make it possible to
sequence the whole genomes of more maize germplasm
accessions. We used whole-genome sequence informa-
tion from 3 typical tropical maize inbreds, 13 typical
temperate maize inbreds from different heterotic
groups, and 1 teosinte line to analyse their genetic
variation and to develop polymorphic molecular markers
for high-resolution MAS, genomic selection, and QTL
mapping. Using diverse germplasm, including teosinte
and different types of maize lines, allows unique
alleles and loci to be better revealed and utilized.
Thus, the objectives of this study were to determine
the abundance and characteristics of SSRs in the maize
genome and to use stringent screening to develop
highly polymorphic SSR markers.

2. Materials and methods 

2.1. Plant materials 

Sequence data were generated for 17 genotypes, includ-
ing 16 improved maize inbred lines and 1 wild relative,
Z. mays ssp. mexicana (hereafter Z. mexicana), which
are listed in Table 1. Among the 16 improved maize
inbreds, CML411 and P1 from the International Maize
and Wheat Improvement Center (CIMMYT) and 81565
from China were chosen to represent tropical/subtropical
germplasm, ES40 was derived from a traditional Chinese
landrace, and the remaining 12 temperate maize inbreds
were chosen to represent different heterotic groups in
Chinese temperate maize. The temperate maize lines 178,
Huangzao4, Ye478, Zheng22, and B73, representing the
PB, SPT, PA, LRC, and BSSS heterotic groups,
respectively, are widely used for commercial hybrid
production. For marker validation, six additional teosinte
species obtained from the United States Department of
Agriculture (USDA) and four maize inbreds were also
included (Table 1).

Table 1

Genotypes | Pedigree | Adaptation
178 (a) | Selected from an introduced hybrid | Temperate
18 Red (a) | American hybrid P78599 | Temperate
18 White (a) | American hybrid P78599 | Temperate
48-2 (a) | Synthesized population | Temperate
A318 | Improved S37 | Tropical
B73 (a) | BSSS | Temperate
Dan598 (a) | (Dan340 × Danhuang11) × (Danhuang02 × Dan599) | Temperate
ES40 (a) | Landrace Linshuidadudu selected from Sichuan | Temperate
Han21 (a) | American hybrid P78599 | Temperate
Huangzao4 (a) | Improved from landrace TangSiPingTou | Temperate
Lu9801 | Ye502 × H21 | Temperate
Mo17 (a) | C103 × 187-2 | Temperate
RP125 (a) | Derived from hybrid Chuandan9 | Temperate
Ye478 (a) | U8112 × Shen5003 | Temperate
Zheng22 (a) | (Duqing × E28) × Lujiu Kuan | Temperate
81565 (a) | (Huobai × Jin03)S2 × Heibai94 | Tropical/subtropical
CML411 (a) | P28C7-S4-#-BBBBBBBBBBB | Tropical/subtropical
CML206 | [EV7992#/EVP044SRBC3]#BF37SR-2-3SR-2-4-3-BB | Tropical/subtropical
CML85 | P34C5F21-2-#1-2-2-# | Tropical/subtropical
P1 (a) | Unknown | Tropical/subtropical
Z. mexicana (a) | Zea mays ssp. mexicana | Tropical
Z. parviglumis | Zea mays ssp. parviglumis | Tropical
Z. huehuetenangensis | Zea mays ssp. huehuetenangensis | Tropical
Z. nicaraguensis | Zea nicaraguensis | Tropical
Z. luxurians | Zea luxurians | Tropical
Z. perennis | Zea perennis | Tropical
Z. diploperennis | Zea diploperennis | Tropical

All the materials were used for experimental validation.
(a) Materials also used for SSR identification and marker development. The chromosome number is 2n [...] and 2n = 20 for other species.

2.2. Maize genome sequences 

For maize lines 81565, 18 Red, 18 White, 48-2,
Dan598, ES40, RP125, and Z. mexicana, sequences
were generated from paired-end libraries constructed
according to the Illumina manufacturer's instructions.
The average resequencing depth was 13× and the
genome coverage was 85% for the maize inbreds. The
teosinte Z. mexicana was sequenced to an average
depth of 9× with a genome coverage of 74%. The
genome sequences for the remaining maize lines were
downloaded from the NCBI Sequence Read Archive
(SRA) database (SRA049859 and SRA051245) and NCBI
GenBank (JQ886798-JQ887980). All sequence reads
were aligned against the maize B73 reference genome
(www.maizesequence.org Release 4a.53) using Short
Oligonucleotide Alignment Program 2
(http://soap.genomics.org.cn/). Sequencing and read
mapping were carried out at the Beijing Genomics
Institute (Shenzhen, China).16-18

2.3. SSR identification and primer design 

SSR motifs were identified in the 17 genomes using the
MISA (MIcroSAtellite identification tool) program
downloaded from the Leibniz Institute of Plant Genetics
and Crop Plant Research website (http://pgrc.ipk-
gatersleben.de/misa/). Only perfect SSRs with mono-,
di-, tri-, tetra-, penta-, and hexa-nucleotide motifs and
at least 10, 7, 6, 5, 4, and 4 uninterrupted repeat units,
respectively, were targeted. The 5'- and 3'-untranslated
regions (UTRs), protein coding sequences (CDS), introns,
and intergenic regions were determined based on the
original annotations of the maize B73 reference genome
(www.maizesequence.org Release 4a.53). Promoter
sequences were defined as the 2 kb upstream of the
transcription initiation site.
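MISA itself is a Perl program; the Python sketch below is not MISA but a minimal re-implementation of the same idea for perfect repeats, reading the thresholds above as minimum repeat counts. At each position it tries motif lengths 1 to 6 in turn, skips motifs that are themselves repeats of a shorter unit (such as ATAT), and, like any greedy left-to-right scan, reports a repeat from the first position at which the threshold is met. The names `MIN_REPEATS`, `is_primitive`, and `find_perfect_ssrs` are hypothetical.

```python
# Minimum number of uninterrupted repeat units per motif length (1-6 nt),
# taken from the thresholds listed in the text.
MIN_REPEATS = {1: 10, 2: 7, 3: 6, 4: 5, 5: 4, 6: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_perfect_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) tuples for perfect SSRs (0-based starts).

    Shorter motifs are tried first at each position; once an SSR is accepted,
    the scan resumes after its last repeat unit, so reported arrays never overlap.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = False
        for k in sorted(min_repeats):
            if i + k > len(seq):
                break  # longer motifs cannot fit either
            motif = seq[i:i + k]
            if "N" in motif or not is_primitive(motif):
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_repeats[k]:
                hits.append((i, motif, n))
                i += n * k
                found = True
                break
        if not found:
            i += 1
    return hits
```

For example, `find_perfect_ssrs("T" + "CAG" * 6 + "T")` returns `[(1, "CAG", 6)]`, whereas five CAG units fall below the tri-nucleotide threshold and yield an empty list.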

Any SSR locus used to develop genetic markers should
include a perfect repeat motif and two unique flanking
sequences of 300 bp on each side of the repeat. In our
study, SSR candidate sequences were used in a BLASTN
search against the genome sequences (e-value cut-off
of 1e-10), and hits were filtered for >90% identity and
an alignment length covering >85% of the flanking
sequences. Sequences with a unique hit, together with
their specific flanking sequences, were identified as
candidate SSR loci. We then wrote a Perl script to
combine SSRs from different genomes that had the same
motif and lay within 5 kb of each other, and to identify
polymorphic SSR loci among the 17 genotypes based on
the presence of the motifs.
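The Perl script is not given in the text. Purely as an illustration, the Python sketch below shows one way the merging step could work under several assumptions of ours: SSR coordinates are already projected onto the B73 reference, motif labels are already normalized, SSRs with the same chromosome and motif are chained whenever successive positions are at most 5 kb apart, and a merged locus counts as polymorphic when repeat counts differ between genomes or at least one genome lacks it. The names `SSR`, `summarize`, and `merge_loci` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class SSR(NamedTuple):
    genome: str   # line name, e.g. "B73"
    chrom: str    # chromosome on the B73 reference
    pos: int      # SSR start on the reference (bp)
    motif: str    # motif label, assumed already normalized
    repeats: int  # number of repeat units in this genome


def summarize(chrom, motif, cluster, genomes):
    """Collapse one cluster into a locus record and flag polymorphism."""
    alleles = {}
    for s in cluster:
        alleles.setdefault(s.genome, s.repeats)  # keep the first hit per genome
    length_poly = len(set(alleles.values())) > 1
    presence_poly = set(alleles) != set(genomes)
    return {"chrom": chrom, "start": cluster[0].pos, "motif": motif,
            "alleles": alleles, "polymorphic": length_poly or presence_poly}


def merge_loci(ssrs, genomes, window=5000):
    """Chain same-motif SSRs on a chromosome whose successive positions are
    at most `window` bp apart, and return one record per merged locus."""
    groups = defaultdict(list)
    for s in ssrs:
        groups[(s.chrom, s.motif)].append(s)
    loci = []
    for (chrom, motif), group in groups.items():
        group.sort(key=lambda s: s.pos)
        cluster = [group[0]]
        for s in group[1:]:
            if s.pos - cluster[-1].pos <= window:
                cluster.append(s)
            else:
                loci.append(summarize(chrom, motif, cluster, genomes))
                cluster = [s]
        loci.append(summarize(chrom, motif, cluster, genomes))
    return sorted(loci, key=lambda locus: (locus["chrom"], locus["start"]))
```

Single-linkage chaining is only one reading of "within 5 kb"; a fixed window around the first SSR of each cluster would be an equally plausible alternative.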

The forward and reverse primers were designed from
the unique flanking sequences using Primer3
(http://primer3.sourceforge.net/). The input parameters
for primer design were as follows: minimum, maximum,
and optimal primer sizes of 18, 27, and 20 nt; minimum
and maximum GC contents of 20 and 80%; and minimum,
maximum, and optimal Tm of 57, 63, and 60°C,
respectively. The amplicon size of each SSR primer pair
ranged from 30 to 500 bp, depending on the expected
SSR sequence length.
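To make the primer-design settings concrete, the sketch below writes one Boulder-IO record for the `primer3_core` command-line program using the parameter values listed above. Two mappings are our assumptions rather than details stated in the text: the primers are forced to flank the repeat through `SEQUENCE_TARGET`, and the 30-500 bp amplicon constraint is expressed as `PRIMER_PRODUCT_SIZE_RANGE`. The function name `primer3_record` is hypothetical.

```python
def primer3_record(seq_id, left_flank, repeat, right_flank):
    """Build one Boulder-IO record for primer3_core (0-based target coordinates)."""
    template = left_flank + repeat + right_flank
    tags = {
        "SEQUENCE_ID": seq_id,
        "SEQUENCE_TEMPLATE": template,
        # Ask Primer3 to place the primers on either side of the repeat.
        "SEQUENCE_TARGET": f"{len(left_flank)},{len(repeat)}",
        "PRIMER_MIN_SIZE": 18,
        "PRIMER_OPT_SIZE": 20,
        "PRIMER_MAX_SIZE": 27,
        "PRIMER_MIN_GC": 20,
        "PRIMER_MAX_GC": 80,
        "PRIMER_MIN_TM": 57,
        "PRIMER_OPT_TM": 60,
        "PRIMER_MAX_TM": 63,
        "PRIMER_PRODUCT_SIZE_RANGE": "30-500",
    }
    lines = [f"{key}={value}" for key, value in tags.items()]
    return "\n".join(lines) + "\n=\n"  # a lone "=" line ends the record
```

Several such records can be concatenated into one file and passed to `primer3_core` on standard input.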

In addition, the electronic PCR (e-PCR) programme
(http://www.ncbi.nlm.nih.gov/projects/e-pcr/) was used
to check the uniqueness and specificity of the designed
primers in the genomes. The parameters were set as
follows: a word size of 9, 1 discontiguous word, a
maximum allowed deviation of 100 in hit product size,
and a maximum of 1 mismatch and 1 indel. Separately,
the published SSR markers deposited in MaizeGDB
(http://www.maizegdb.org/) were downloaded and
amplified in silico with the e-PCR programme for
further comparison.
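NCBI e-PCR uses a word-based search. The simplified Python sketch below is a stand-in for illustration, not e-PCR: it scans exhaustively for primer sites with at most one mismatch on both strands and reports every product whose size lies within 100 bp of the expected size, mirroring the mismatch and size-deviation settings above. It ignores indels and is practical only for short sequences. We assume a primer pair is treated as specific when exactly one product is returned. All function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq, primer, max_mm=1):
    """0-based starts where `primer` matches `seq` with <= max_mm mismatches."""
    k = len(primer)
    sites = []
    for i in range(len(seq) - k + 1):
        mm = 0
        for a, b in zip(seq[i:i + k], primer):
            if a != b:
                mm += 1
                if mm > max_mm:
                    break
        if mm <= max_mm:
            sites.append(i)
    return sites


def in_silico_pcr(genome, fwd, rev, expected_size, margin=100, max_mm=1):
    """List (strand, start, size) for every predicted product whose size is
    within `margin` bp of `expected_size`. Starts on the '-' strand are
    coordinates in the reverse-complemented sequence."""
    rc_rev = revcomp(rev)  # the reverse primer's site reads as its reverse complement
    products = []
    for strand, seq in (("+", genome), ("-", revcomp(genome))):
        rev_sites = find_sites(seq, rc_rev, max_mm)
        for f in find_sites(seq, fwd, max_mm):
            for r in rev_sites:
                size = r + len(rev) - f
                if r >= f and abs(size - expected_size) <= margin:
                    products.append((strand, f, size))
    return products
```

A duplicated target region produces more than one product, which is the situation the uniqueness check is meant to catch.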

2.4. Experimental validation of polymorphic SSR markers

To assess the value of the identified SSR markers, 151
primer pairs from the 10 chromosomes, covering all
SSR types, were chosen for experimental validation.
The samples used in this experiment included 20
improved maize lines and 7 teosinte lines (Table 1).
Genomic DNA was extracted from seedlings using the
CTAB method. Primers were synthesized by Shanghai DNA
Biotechnologies Co., Ltd. PCR was performed in 25 μl
reactions containing 2.5 μl buffer, 2.5 μl MgCl2
(25 mM), 4.0 μl dNTP (2.5 mM), 0.2 μl Taq polymer-
ase (5 U/μl), 1 μl template DNA (100 ng/μl), 13.8 μl
ddH2O, and 0.1 μg primers. The PCR conditions were
as follows: 1 cycle at 94°C for 5 min; 35 cycles at 94°C
for 30 s, 60°C for 30 s, and 72°C for 1 min; and 1 cycle
at 72°C for 10 min. PCR products mixed with loading
buffer were heated at 95°C for 5 min and quickly
chilled on ice. The entire mixture was electrophoresed
on a 6% denaturing polyacrylamide gel, and genotypes
were scored after silver staining. The number of
alleles was recorded, and the polymorphism informa-
tion content (PIC) was calculated as described by
Smith et al.19
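The PIC formula is not written out here. As a minimal sketch, we assume the form commonly applied to inbred maize lines, PIC = 1 - Σ p_i², where p_i is the frequency of the i-th allele among the genotypes scored, with one allele call per (homozygous) line and missing calls ignored. Readers should check this against Smith et al. before relying on it. The function name `pic` is hypothetical.

```python
from collections import Counter


def pic(alleles):
    """Polymorphism information content, PIC = 1 - sum(p_i ** 2).

    `alleles` holds one allele call (e.g. fragment size in bp) per inbred line;
    None marks missing data and is ignored.
    """
    calls = [a for a in alleles if a is not None]
    if not calls:
        raise ValueError("no allele calls")
    n = len(calls)
    return 1.0 - sum((c / n) ** 2 for c in Counter(calls).values())
```

For four lines carrying alleles of 150, 150, 152, and 154 bp, the frequencies are 0.5, 0.25, and 0.25, giving PIC = 1 - (0.25 + 0.0625 + 0.0625) = 0.625.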



3. Results 

3.1. The abundance of SSRs in the maize genome

A large number of perfect SSRs with mono-, di-, tri-,
tetra-, penta-, and hexa-nucleotide motifs were identi-
fied, but the numbers varied among genomes
(Table 2). The average number of SSRs across the 17
genotypes was 135 693, ranging from 133 346 loci in
Z. mexicana to 136 723 loci in the tropical/subtropical
maize inbred 81565. Some reads from Z. mexicana
could not be mapped onto the reference genome,
which resulted in relatively lower genome coverage
and thus fewer SSRs identified than in the maize
inbreds. A total of 264 658 unique SSR loci were
detected in the 17 genomes, of which 153 231, 65 236,
25 910, 6572, 8839, and 4870 were mono-, di-, tri-,
tetra-, penta-, and hexa-nucleotide SSRs, respectively.
The mono-nucleotide motif was the most abundant,
accounting for 57.90%. There were 38 971 common
SSRs (15% of the total) shared by all 17 genotypes.
SSR density was calculated based on a maize reference
genome size of 2.1 Gb; there was little difference
among the 17 genotypes for each motif type, with an
average interval of 15.48 kb between SSR loci in each
genome. However, the average intervals for mono-, di-,
tri-, tetra-, penta-, and hexa-nucleotide SSRs differed
markedly: 26.93, 60.88, 150.95, 717.70, 505.05, and
942.55 kb, respectively (Table 2). SSRs were abundant
and distributed throughout the maize genome, with a
small average marker interval (7.93 kb) when all
detected loci were considered.
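The interval values in Table 2 are consistent with dividing the assumed 2.1 Gb genome size by the number of loci, as if the loci were spread evenly. The short Python sketch below reproduces that arithmetic for the average per-genome counts; it ignores the uneven distribution along chromosomes described in Section 3.3. The names `GENOME_SIZE_KB`, `mean_interval_kb`, and `AVERAGE_COUNTS` are hypothetical.

```python
GENOME_SIZE_KB = 2.1e6  # 2.1 Gb reference genome size used in the text


def mean_interval_kb(n_loci, genome_size_kb=GENOME_SIZE_KB):
    """Average spacing between SSR loci if they were spread evenly over the genome."""
    if n_loci <= 0:
        raise ValueError("n_loci must be positive")
    return genome_size_kb / n_loci


# Average per-genome SSR counts from Table 2.
AVERAGE_COUNTS = {"MNR": 77974, "DNR": 34496, "TNR": 13912,
                  "TTR": 2926, "PNR": 4158, "HNR": 2228}

for motif, n in AVERAGE_COUNTS.items():
    print(f"{motif}: {mean_interval_kb(n):.2f} kb")
```

The same call with the per-genome average of 135 693 loci gives about 15.48 kb, and with all 264 658 loci about 7.93 kb.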

We also examined the SSR repeat types in each
genome for all tested genotypes. The frequencies of
the repeat types within each motif class differed,
but they showed similar frequency patterns



Table 2. Numbers and density of SSR loci identified in 17 maize genomes

Genotypes | MNR | DNR | TNR | TTR | PNR | HNR | Total | MNR (kb) | DNR (kb) | TNR (kb) | TTR (kb) | PNR (kb) | HNR (kb) | Total (kb)
178 | 78 367 | 34 604 | 13 929 | 2964 | 4172 | 2226 | 136 262 | 26.80 | 60.69 | 150.76 | 708.50 | 503.36 | 943.40 | 15.41
81565 | 79 042 | 34 457 | 13 893 | 2929 | 4178 | 2224 | 136 723 | 26.57 | 60.95 | 151.16 | 716.97 | 502.63 | 944.24 | 15.36
18 White | 77 144 | 34 451 | 13 931 | 2928 | 4166 | 2217 | 134 837 | 27.22 | 60.96 | 150.74 | 717.21 | 504.08 | 947.23 | 15.57
18 Red | 77 049 | 34 495 | 13 919 | 2933 | 4188 | 2221 | 134 805 | 27.26 | 60.88 | 150.87 | 715.99 | 501.43 | 945.52 | 15.58
48-2 | 76 920 | 34 448 | 13 884 | 2924 | 4123 | 2219 | 134 518 | 27.30 | 60.96 | 151.25 | 718.19 | 509.34 | 946.37 | 15.61
B73 | 77 888 | 34 755 | 14 028 | 2948 | 4181 | 2239 | 136 039 | 26.96 | 60.42 | 149.70 | 712.35 | 502.27 | 937.92 | 15.44
CML411 | 78 591 | 34 546 | 13 900 | 2935 | 4189 | 2225 | 136 386 | 26.72 | 60.79 | 151.08 | 715.50 | 501.31 | 943.82 | 15.40
Dan598 | 78 558 | 34 559 | 13 954 | 2904 | 4159 | 2227 | 136 361 | 26.73 | 60.77 | 150.49 | 723.14 | 504.93 | 942.97 | 15.40
ES40 | 78 978 | 34 372 | 13 894 | 2924 | 4161 | 2221 | 136 550 | 26.59 | 61.10 | 151.14 | 718.19 | 504.69 | 945.52 | 15.38
Han21 | 78 539 | 34 584 | 13 951 | 2898 | 4157 | 2233 | 136 362 | 26.74 | 60.72 | 150.53 | 724.64 | 505.17 | 940.44 | 15.40
Huangzao4 | 76 773 | 34 445 | 13 878 | 2907 | 4165 | 2224 | 134 392 | 27.35 | 60.97 | 151.32 | 722.39 | 504.20 | 944.24 | 15.63
Mo17 | 78 360 | 34 442 | 13 910 | 2902 | 4108 | 2220 | 135 942 | 26.80 | 60.97 | 150.97 | 723.64 | 511.20 | 945.95 | 15.45
P1 | 78 975 | 34 447 | 13 849 | 2961 | 4146 | 2235 | 136 613 | 26.59 | 60.96 | 151.64 | 709.22 | 506.51 | 939.60 | 15.37
RP125 | 76 880 | 34 523 | 13 934 | 2929 | 4198 | 2244 | 134 708 | 27.32 | 60.83 | 150.71 | 716.97 | 500.24 | 935.83 | 15.59
Z. mexicana | 75 997 | 34 339 | 13 806 | 2886 | 4085 | 2233 | 133 346 | 27.63 | 61.15 | 152.11 | 727.65 | 514.08 | 940.44 | 15.75
Ye478 | 78 571 | 34 508 | 13 973 | 2944 | 4159 | 2234 | 136 389 | 26.73 | 60.86 | 150.29 | 713.32 | 504.93 | 940.02 | 15.40
Zheng22 | 78 921 | 34 455 | 13 870 | 2920 | 4156 | 2230 | 136 552 | 26.61 | 60.95 | 151.41 | 719.18 | 505.29 | 941.70 | 15.38
Average | 77 974 | 34 496 | 13 912 | 2926 | 4158 | 2228 | 135 693 | 26.93 | 60.88 | 150.95 | 717.70 | 505.05 | 942.55 | 15.48
Total | 153 231 | 65 236 | 25 910 | 6572 | 8839 | 4870 | 264 658 | 13.70 | 32.19 | 81.05 | 319.54 | 237.58 | 431.21 | 7.93
Common | 22 453 | 9963 | 4603 | 553 | 929 | 470 | 38 971 | 93.53 | 210.78 | 456.22 | 3797.47 | 2260.50 | 4468.09 | 53.89

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs. The first seven numeric columns give SSR numbers; the columns marked (kb) give the average SSR interval.
Table 3

Motifs | Number of repeats: <5 | 5-7 | 8-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31-40 | >40 | Total number | Average repeat number | Average repeat length (bp)
G/C | 0 | 0 | 13 085 | 24 297 | 3832 | 866 | 243 | 45 | 1 | 42 369 | 12.31 | 12.31
A/T | 0 | 0 | 22 410 | 11 687 | 1114 | 192 | 57 | 32 | 27 | 35 519 | 11.02 | 11.02
AT | 0 | 1875 | 3268 | 1379 | 708 | 478 | 378 | 466 | 111 | 8663 | 13.28 | 26.56
CT/AG | 0 | 3615 | 3504 | 740 | 265 | 127 | 72 | 95 | 47 | 8465 | 9.42 | 18.84
TA | 0 | 1725 | 2417 | 1087 | 625 | 465 | 364 | 492 | 134 | 7309 | 14.03 | 28.05
GA/TC | 0 | 3220 | 2938 | 519 | 237 | 103 | 71 | 57 | 28 | 7173 | 9.19 | 18.39
CA/TG | 0 | 666 | 542 | 72 | 15 | 5 | 1 | 5 | 1 | 1307 | 8.24 | 16.48
GT/AC | 0 | 582 | 588 | 73 | 12 | 5 | 4 | 4 | 0 | 1268 | 8.32 | 16.64
GC | 0 | 223 | 66 | 3 | 3 | 0 | 0 | 0 | 0 | 295 | 7.42 | 14.83
CG | 0 | 219 | 55 | 0 | 1 | 0 | 0 | 0 | 0 | 275 | 7.29 | 14.59
CAG/CTG | 0 | 1980 | 242 | 3 | 0 | 0 | 0 | 0 | 0 | 2225 | 6.5 | 19.51
GAT/ATC | 0 | 526 | 264 | 46 | 5 | 1 | 1 | 0 | 0 | 843 | 7.49 | 22.48
GCA/TGC | 0 | 711 | 99 | 0 | 0 | 0 | 0 | 0 | 0 | 810 | 6.42 | 19.25
GAC/GTC | 0 | 587 | 184 | 28 | 1 | 0 | 0 | 0 | 0 | 800 | 7.02 | 21.06
ATT/AAT | 0 | 352 | 124 | 65 | 42 | 28 | 18 | 10 | 4 | 643 | 10.25 | 30.74
TTA/TAA | 0 | 359 | 96 | 66 | 32 | 27 | 24 | 14 | 2 | 620 | 10.31 | 30.94
CGT/ACG | 0 | 429 | 131 | 34 | 0 | 0 | 0 | 0 | 0 | 594 | 7.18 | 21.55
TGA/TCA | 0 | 350 | 167 | 58 | 9 | 3 | 1 | 0 | 0 | 588 | 7.86 | 23.59
CGA/TCG | 0 | 474 | 100 | 4 | 0 | 0 | 0 | 0 | 0 | 578 | 6.69 | 20.06
CGC/GCG | 0 | 505 | 49 | 4 | 1 | 0 | 0 | 0 | 0 | 559 | 6.5 | 19.51
GCC/GGC | 0 | 490 | 42 | 5 | 0 | 0 | 0 | 0 | 0 | 537 | 6.49 | 19.46
TAT/ATA | 0 | 246 | 103 | 43 | 47 | 35 | 13 | 13 | 5 | 505 | 11.39 | 34.16
CGG/CCG | 0 | 416 | 48 | 3 | 0 | 0 | 0 | 0 | 0 | 467 | 6.5 | 19.51
TAC/GTA | 0 | 232 | 50 | 26 | 16 | 6 | 1 | 5 | 1 | 337 | 8.54 | 25.63
TTG/CAA | 0 | 230 | 80 | 16 | 4 | 0 | 0 | 0 | 0 | 330 | 7.32 | 21.96
TTC/GAA | 0 | 288 | 26 | 9 | 0 | 0 | 1 | 2 | 3 | 329 | 7.42 | 22.26
ATG/CAT | 0 | 260 | 45 | 11 | 4 | 0 | 1 | 0 | 0 | 321 | 7.01 | 21.04
GCT/AGC | 0 | 291 | 16 | 1 | 1 | 0 | 0 | 0 | 0 | 309 | 6.36 | 19.09
TGG/CCA | 0 | 280 | 24 | 2 | 0 | 0 | 0 | 0 | 0 | 306 | 6.4 | 19.2
TAG/CTA | 0 | 186 | 49 | 26 | 11 | 10 | 8 | 3 | 3 | 296 | 9.63 | 28.9
CTC/GAG | 0 | 232 | 46 | 6 | 0 | 0 | 0 | 0 | 0 | 284 | 6.82 | 20.45
CTT/AAG | 0 | 245 | 31 | 1 | 1 | 1 | 0 | 1 | 1 | 281 | 7 | 21
TCC/GGA | 0 | 193 | 43 | 8 | 1 | 0 | 0 | 0 | 0 | 245 | 6.91 | 20.74
CCT/AGG | 0 | 187 | 32 | 5 | 2 | 1 | 0 | 0 | 0 | 227 | 6.93 | 20.79
CAC/GTG | 0 | 198 | 18 | 0 | 1 | 0 | 0 | 0 | 0 | 217 | 6.46 | 19.38
ACC/GGT | 0 | 191 | 16 | 1 | 0 | 0 | 0 | 0 | 0 | 208 | 6.39 | 19.17
ACA/TGT | 0 | 125 | 58 | 10 | 0 | 0 | 1 | 0 | 0 | 194 | 7.34 | 22.02
AGA/TCT | 0 | 169 | 21 | 2 | 0 | 1 | 0 | 0 | 0 | 193 | 6.58 | 19.74
GTT/AAC | 0 | 85 | 29 | 4 | 0 | 1 | 1 | 0 | 0 | 120 | 7.45 | 22.35
AAAT/ATTT | 0 | 288 | 15 | 2 | 0 | 0 | 0 | 0 | 0 | 305 | 5.58 | 22.33
AGGC/GCCT | 0 | 153 | 11 | 4 | 0 | 0 | 0 | 0 | 0 | 168 | 5.9 | 23.6
TATT/AATA | 0 | 195 | 10 | 2 | 0 | 0 | 0 | 0 | 0 | 207 | 5.72 | 22.88
TCGT/ACGA | 0 | 126 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 126 | 5.07 | 20.29
TTAT/ATAA | 0 | 111 | 16 | 0 | 0 | 0 | 0 | 0 | 0 | 127 | 5.88 | 23.53
TTTA/TAAA | 0 | 157 | 7 | 2 | 0 | 0 | 0 | 0 | 0 | 166 | 5.67 | 22.67
CGAGC/GCTCG | 153 | 24 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 177 | 4.16 | 24.95
No. 5]    J. Xu et al.    503

Table 3. Continued

Motifs | Number of repeats: <5 | 5-7 | 8-10 | 11-15 | 16-20 | 21-25 | 26-30 | 31-40 | >40 | Total number | Average repeat number | Average repeat length (bp)


TTTTA/TAAAA | 82 | 28 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 110 | 4.29 | 25.75
ATTTT/AAAAT | 69 | 35 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 105 | 4.42 | 22.1

Only SSR motifs with a total number of more than 100 loci are listed.



in different genomes. Here, we compared the frequen-
cies of different SSR repeat types using the reference
line B73 as an example (Table 3). Among mono-
nucleotide motifs, C/G repeats accounted for ~54.4%,
slightly more than A/T repeats. Among di-nucleotide
motifs, (AT)n was the most frequent (24.93%),
followed by (CT/AG)n (24.36%), (TA)n (21.03%), and
(GA/TC)n (20.64%), while the (CG)n motif was the
least frequent (0.80%). Among tri-nucleotide motifs,
(CAG/CTG)n was the most abundant (15.86%), while
the other repeat types had lower frequencies (0.4-6%).
Among tetra-nucleotide SSRs, (AAAT/ATTT)n was the
most frequent (10.35%), and the frequencies of the
remaining repeat types were all below 7%. There were
many types of penta- and hexa-nucleotide SSRs, each
with a low frequency ranging from 0.04 to 4%. The
numbers of mono-, di-, tri-, tetra-, penta-, and hexa-
nucleotide motifs in the different repeat-number
classes are also listed in Table 3. Average repeat
lengths differed among motifs, ranging from 11.02 bp
for (A/T)n to 58.84 bp for (AGT/ACT)n.
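The motif labels in Table 3 appear to pair each motif with its reverse complement (CT/AG, CAG/CTG, AAAT/ATTT). Under that reading, self-complementary motifs such as AT and TA keep single names, which would explain why they are listed separately. The Python sketch below shows that labelling, with alphabetical ordering chosen by us (the table's own ordering differs), and recomputes the within-class frequencies quoted above from the B73 di-nucleotide totals. The names `rc_label`, `class_percentages`, and `B73_DI` are hypothetical.

```python
_COMP = str.maketrans("ACGT", "TGCA")


def rc_label(motif):
    """Label a motif together with its reverse complement, e.g. 'CT' -> 'AG/CT'.

    Self-complementary motifs such as 'AT' or 'GC' keep a single name.
    """
    rc = motif.translate(_COMP)[::-1]
    return motif if rc == motif else "/".join(sorted((motif, rc)))


def class_percentages(counts):
    """Percentage of each repeat type within one motif class."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# B73 di-nucleotide totals from Table 3 (labels as printed in the table).
B73_DI = {"AT": 8663, "CT/AG": 8465, "TA": 7309, "GA/TC": 7173,
          "CA/TG": 1307, "GT/AC": 1268, "GC": 295, "CG": 275}
```

`class_percentages(B73_DI)` gives about 24.93% for AT and 24.36% for CT/AG, matching the percentages in the text.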

3.2. Screening of SSR loci and development of maize SSR markers

A total of 2034 SSR markers have been developed and
reported on the MaizeGDB website (www.maizegdb.org).
Among these public markers, 1556 SSRs have genomic
positions. When e-PCR was run against the B73 genome,
827 of these SSR markers had a specific amplicon, 60
had more than one binding site, and the remaining
markers had no proper binding site on the 10
chromosomes. Here, we developed a new database
containing more SSR markers with unique flanking
sequences. Of the 264 658 SSRs detected across the
17 maize genomes, 189 087 (71.45%) had unique
flanking sequences, with an average of 82 741.9 such
SSR loci per genome (Table 4). The average proportion
of SSR loci with unique flanking sequences per genome
differed notably among motif types: 55.19, 74.60,
48.09, 81.94, 80.69, and 68.94% for mono-, di-, tri-,
tetra-, penta-, and hexa-nucleotide motifs, respectively
(Table 4). This implies that over 80% of the tetra- and
penta-nucleotide motifs in the maize genome can be
used to design SSR markers. A total of 25 437 SSRs
with unique flanking sequences were shared across the
17 tested genomes, of which 9240 (36.33%) were
polymorphic.

Of the 189 087 candidate SSRs, 188 571 loci had spe-
cific physical positions and were selected for
development as genetic markers in this study. Primer
pairs were then designed for these 188 571 SSR loci,
ranging from 13 344 (chromosome 10) to 29 779
(chromosome 1) SSRs per chromosome, and 173 587 of
them were polymorphic, showing length differences and
presence/absence variation among the 17 genomes. The
e-PCR programme was then used to validate and refine
the specificity of the newly designed SSR markers:
111 887 primer pairs bound as expected, whereas the
others had multiple binding sites or false matches. By
comparing SSR sequences among the 17 tested genomes,
we developed a new database of 111 887 SSR markers
with specific physical positions, representing 59% of
the candidate SSR loci with specific flanking sequences
(Table 5 and Supplementary Table S1). Among these
markers, SSRs with mono-, di-, tri-, tetra-, penta-, and
hexa-nucleotide motifs accounted for 58.00, 26.09,
7.20, 3.00, 3.93, and 1.78%, respectively. A total of
35 573 SSR loci, 31.8% of the refined SSR markers,
showed length polymorphism among the 17 tested
genotypes. The PIC for these polymorphic SSRs ranged
from 0.05 to 0.83, with an average of 0.31
(Supplementary Table S1). SSR markers with mono- and
di-nucleotide motifs showed higher levels of
polymorphism (33.87 and 37.31%, respectively) than
those with tetra-, penta-, and hexa-nucleotide motifs
(7.44-17.19%). Compared with the SSR markers in the
MaizeGDB database, 18 606 of the newly developed SSR
markers (16.6%) shared the same loci with public SSR
markers carrying various motifs. However, only 527
(0.47%) newly developed SSR markers had positions fully
identical to those of public SSR primers. Additionally,
the average SSR lengths and numbers of loci across the
10 chromosomes were calculated for three SSR datasets:
all SSRs, SSRs with unique flanking sequences, and
polymorphic SSRs (Fig. 1). In each of the three
datasets, the number of loci gradually declined as SSR
length increased, as shown in previous studies.20
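The motif-class shares and polymorphism rates quoted above follow directly from the Total row of Table 5. The Python sketch below recomputes them from the monomorphic and polymorphic counts per motif class; `TABLE5_TOTALS` and `summarize` are hypothetical names, and the only inputs are the table's own totals.

```python
# (monomorphic, polymorphic) marker counts per motif class, Total row of Table 5
TABLE5_TOTALS = {
    "MNR": (42910, 21982), "DNR": (18301, 10894), "TNR": (6533, 1519),
    "TTR": (2780, 577), "PNR": (3948, 453), "HNR": (1842, 148),
}


def summarize(totals):
    """Return total markers, each class's share (%), and each class's polymorphism rate (%)."""
    markers = sum(m + p for m, p in totals.values())
    share = {k: 100.0 * (m + p) / markers for k, (m, p) in totals.items()}
    poly_rate = {k: 100.0 * p / (m + p) for k, (m, p) in totals.items()}
    return markers, share, poly_rate


markers, share, poly_rate = summarize(TABLE5_TOTALS)
```

Here `markers` is 111 887, `share["MNR"]` is about 58.00%, and `poly_rate["DNR"]` is about 37.31%, in line with the percentages in the text.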
Table 4. Summary of SSR loci with unique flanking sequences identified in tested maize genomes

Motifs | Average SSRs: No. | % (a) | Total SSRs: No. | % (b) | Common SSRs: No. | % (c) | Common SSRs with polymorphism: No. | %
MNR | [illegible] | 55.19 | 103 486 | 67.54 | 12 029 | 54.31 | 4588 | 38.14
DNR | 25 733.2 | 74.60 | 52 876 | 81.05 | 8630 | 79.87 | 3776 | 43.75
TNR | 6689.9 | 48.09 | 15 946 | 61.54 | 2755 | 59.53 | 560 | 20.33
TTR | 2397.2 | 81.94 | 5658 | 86.09 | 652 | 83.27 | 142 | 21.78
PNR | 3355.4 | 80.69 | 7481 | 84.64 | 950 | 83.11 | 129 | 13.58
HNR | 1535.8 | 68.94 | 3640 | 74.74 | 422 | 74.17 | 45 | 10.66
Total | 82 741.9 | 60.98 | 189 087 | 71.45 | 25 437 | 63.47 | 9240 | 36.33

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs.
(a) Percentage of the average number of SSRs with unique flanking sequences against all SSRs in each tested maize genome.
(b) Percentage of the total number of SSRs with unique flanking sequences against all SSRs identified in the 17 maize lines.
(c) Percentage of the common loci against all loci that are the same in the 17 maize lines.



Table 5. Numbers of candidate SSR markers and polymorphic SSR markers detected in 17 maize lines, and of previously developed SSRs in the MaizeGDB database

      Candidate SSR markers
      MNR             DNR             TNR             TTR             PNR             HNR             Poly.     SSRs in
Chr   Mono    Poly    Mono    Poly    Mono    Poly    Mono    Poly    Mono    Poly    Mono    Poly    (%)(a)    MaizeGDB
1     6957    3422    2958    1737    1129    242     439     84      643     73      304     12      30.94     293
2     4875    2488    2118    1212    697     157     294     76      418     46      203     17      31.71     226
3     5079    2414    2289    1217    777     177     330     69      514     49      209     17      30.01     224
4     4831    2331    2246    1264    774     166     331     73      422     45      192     23      30.73     141
5     4623    2498    1851    1140    681     164     276     67      461     55      198     18      32.76     146
6     3501    1816    1499    844     579     118     248     44      345     25      158     12      31.11     111
7     3548    1870    1473    929     505     150     271     41      285     44      150     12      32.83     112
8     3553    1845    1391    901     504     118     185     32      315     50      180     13      32.56     118
9     3059    1726    1262    847     451     100     203     39      297     40      137     16      33.85     102
10    2884    1572    1214    803     436     127     203     52      248     26      111     8       33.68     83
Total 42 910  21 982  18 301  10 894  6533    1519    2780    577     3948    453     1842    148     31.79     1556

MNR, DNR, TNR, TTR, PNR, and HNR indicate mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSRs.
Mono: monomorphic; Poly: polymorphic; Chr: chromosome.
(a) Percentage of polymorphic SSR markers among all candidate SSR markers in the in silico analysis.
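The Total row of Table 5 can be cross-checked against the Abstract: adding the monomorphic and polymorphic counts gives the 111 887 candidate markers, the polymorphic counts alone give 35 573, and the per-motif shares give the 58.00, 26.09, 7.20, 3.00, 3.93, and 1.78% quoted there. A minimal Python sketch of that arithmetic follows, using only numbers from the table; the helper name is hypothetical.

```python
# "Total" row of Table 5: motif class -> (monomorphic, polymorphic) candidate markers
TOTALS = {
    "MNR": (42910, 21982),
    "DNR": (18301, 10894),
    "TNR": (6533, 1519),
    "TTR": (2780, 577),
    "PNR": (3948, 453),
    "HNR": (1842, 148),
}

def summarize(totals: dict[str, tuple[int, int]]) -> tuple[int, int, float, dict[str, float]]:
    """Return candidate count, polymorphic count, polymorphic %, and each motif's % share."""
    candidates = sum(mono + poly for mono, poly in totals.values())
    polymorphic = sum(poly for _, poly in totals.values())
    poly_pct = round(100 * polymorphic / candidates, 2)
    shares = {m: round(100 * (mono + poly) / candidates, 2) for m, (mono, poly) in totals.items()}
    return candidates, polymorphic, poly_pct, shares

n, n_poly, pct, motif_shares = summarize(TOTALS)
print(n, n_poly, pct)   # 111887 35573 31.79
print(motif_shares)
```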



3.3. Distribution of SSRs in different genomic regions 

A total of 264 658 SSRs were detected in the 17 genomes, and 263 423 of these loci have specific physical positions. The distributions of the 263 423 SSR loci and of the 111 887 newly developed SSR markers refined by the e-PCR programme across the 17 tested genomes are shown in Fig. 2a and b, respectively. SSRs were unevenly distributed along the chromosomes, with many more loci located near telomeric regions than near centromeres, which is in accordance with the distribution pattern of genes in maize.21
Moreover, we compared SSR distributions across six genomic regions (5'-UTR, 3'-UTR, CDS, intron, promoter, and intergenic) using the genomes of P1, B73, and Z. mexicana to represent tropical, temperate, and wild maize germplasm, respectively (Table 6). SSR loci were most abundant in intergenic regions and least frequent in UTRs. The polymorphism rate and GC content of SSRs in coding regions were higher than in other genic regions.

The average intervals between SSRs were longest in intergenic regions, second longest in CDS regions, and shortest in promoters (Table 6). The distributions of SSRs with unique flanking sequences and of polymorphic SSRs across the tested genomes also varied among the six genomic regions, but the trend was consistent with that for all candidate SSR loci. This result is in full agreement with a previous report in rice.22 In addition, the SSR distribution was rather similar among the three representative genotypes.
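The interval columns of Table 6 behave as if interval = length of sequence analysed / number of SSRs: within one region, count × interval is nearly constant across the three SSR categories (for the B73 5'-UTR, 3804 × 6.91 kb, 3601 × 7.30 kb, and 1335 × 19.69 kb all come to about 26.3 Mb). The Python sketch below works under that assumption; the ~2.1 Gb assembly length in the last example is an assumed round figure, not a value reported in this section.

```python
def mean_interval_kb(sequence_length_bp: int, n_ssrs: int) -> float:
    """Average spacing between SSRs (kb), assuming interval = analysed length / SSR count."""
    if n_ssrs <= 0:
        raise ValueError("n_ssrs must be positive")
    return sequence_length_bp / n_ssrs / 1000

def implied_length_mb(n_ssrs: int, interval_kb: float) -> float:
    """Invert the relation: analysed length (Mb) implied by a count and an interval."""
    return n_ssrs * interval_kb / 1000

# B73 5'-UTR rows of Table 6 (all SSRs, unique-flanking SSRs, polymorphic SSRs)
B73_5UTR = [(3804, 6.91), (3601, 7.30), (1335, 19.69)]
for count, interval in B73_5UTR:
    print(round(implied_length_mb(count, interval), 1))   # 26.3 each time

# Hypothetical genome-wide case: an assumed ~2.1 Gb assembly and 135 693 SSRs per genome
print(round(mean_interval_kb(2_100_000_000, 135_693), 2))  # 15.48
```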

Furthermore, we examined the repeat types of SSRs in the CDS regions of B73. Tri-nucleotide SSRs were the most numerous (1832) of the six repeat types, accounting for 71.8% of the SSRs in CDS regions. Tri- and hexa-nucleotide SSRs, whose gain or loss of repeat units does not cause a frame shift, accounted for 84.1% (2148) of the SSRs in CDS regions. Therefore, only 15.9% of the SSRs in CDS regions pose a potential threat to gene structure.
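The frame-shift argument rests on simple arithmetic: adding or removing whole repeat units changes the length of a coding sequence by a multiple of the motif length, so the reading frame is preserved only when the motif length is divisible by 3. The Python sketch below applies this to the B73 CDS counts; the hexa-nucleotide count (316) is inferred as 2148 - 1832, and the CDS total (2553) is taken from Table 6.

```python
def preserves_frame(motif_length: int) -> bool:
    """Gaining or losing one repeat unit shifts the frame unless the unit is a multiple of 3."""
    return motif_length % 3 == 0

def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

frame_safe_motifs = [k for k in range(1, 7) if preserves_frame(k)]   # [3, 6]

TRI, HEXA = 1832, 2148 - 1832   # hexa-nucleotide count inferred from the text
CDS_TOTAL = 2553                # B73 CDS SSR count (Table 6)

tri_pct = percent(TRI, CDS_TOTAL)              # 71.8
safe_pct = percent(TRI + HEXA, CDS_TOTAL)      # 84.1
risky_pct = round(100 - safe_pct, 1)           # 15.9
print(frame_safe_motifs, tri_pct, safe_pct, risky_pct)
```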



Figure 1. Distributions of 263 423 SSR loci (a) and 111 887 newly developed SSR markers (b) with unique physical positions across the 10 chromosomes (Chr.1-Chr.10) of the B73 reference genome (www.maizesequence.org Release 4a.53). Different colors represent different levels of SSR density.



3.4. SSR markers validated for quality and polymorphism

A total of 151 SSR markers were randomly chosen for experimental validation using 20 maize inbreds and 7 teosinte lines (Fig. 3 and Table 1). Of these, 121 primer pairs (80.1%) generated specific products and distinct bands, whereas 30 primer pairs failed to produce stable or clear bands owing to a lack of sequence specificity in the genomic DNA samples. Most of the 121 primer pairs (112) revealed high levels of allelic diversity across the 27 tested lines, with PIC values of 0.074-0.796 (average 0.478). The 112 polymorphic SSR loci contained 329 alleles in total, with an average of 2.94 alleles per locus and a range of 2-5 (Supplementary Table S2).
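This section does not state which PIC formula was used. As an illustration only, the Python sketch below computes two common variants from allele (band-size) frequencies: the simplified form 1 - sum(p_i^2), often used for SSR data, and the original Botstein et al. (1980) form, which additionally subtracts the sum over allele pairs of 2 p_i^2 p_j^2. Treating each line as contributing a single allele is an assumption suited to homozygous inbreds, and the band sizes in the example are hypothetical.

```python
from collections import Counter
from itertools import combinations

def allele_frequencies(alleles: list[int]) -> list[float]:
    """Relative frequency of each allele; each scored line contributes one allele
    (an assumption that fits homozygous inbred lines)."""
    counts = Counter(alleles)
    n = len(alleles)
    return [c / n for c in counts.values()]

def pic_simple(freqs: list[float]) -> float:
    """Simplified PIC (expected heterozygosity): 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs)

def pic_botstein(freqs: list[float]) -> float:
    """Botstein et al. (1980) PIC: 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    pair_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return pic_simple(freqs) - pair_term

# Hypothetical marker scored in 27 lines: band size (bp) observed in each line
bands = [150] * 15 + [154] * 9 + [160] * 3
freqs = allele_frequencies(bands)
print(len(freqs), round(pic_simple(freqs), 3), round(pic_botstein(freqs), 3))  # 3 0.568 0.489
```

For a marker with two equally frequent alleles the two variants give 0.5 and 0.375, which is why published PIC values depend on the formula chosen.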

In addition, we compared in detail the allele numbers and PIC values from the in silico analysis with those from marker validation for the 17 tested genotypes. For 41 of the 121 primer pairs, the observed number of alleles matched the expected number; 38 primer pairs had more alleles, and 42 primer pairs had fewer alleles, in the in silico analysis than in marker validation (Supplementary Table S2). Additionally, comparing polymorphism in the in silico analysis of the 17 tested genomes with that in marker validation using 27 genotypes, we found that 51 primer pairs showed more alleles and higher PIC values in the validation experiment (Supplementary Table S2). Interestingly, 26 of the 151 SSR markers that showed no polymorphism in silico showed more than one allele in the validation experiment. We also found that the lengths of PCR products predicted in silico were almost consistent with those observed in marker validation (Supplementary Tables S1 and S2). The results indicate that the newly developed SSR markers are informative and useful, and that 70% of the SSR markers in our database are valid and polymorphic.

Figure 2. Correlation between SSR numbers and SSR lengths.

Table 6. Distribution of SSRs in different genomic regions of B73, Z. mexicana, and P1

                           All SSR loci                               SSR loci with unique flanking sequences    SSR loci with polymorphism
Genome      Region         Count    Interval(kb) Length(bp) GC%       Count    Interval(kb) Length(bp) GC%       Count    Interval(kb) Length(bp) GC%
B73         5'-UTR         3804     6.91         16.10      32.20     3601     7.30         16.19      32.60     1335     19.69        16.23      26.55
            3'-UTR         3829     6.87         16.13      33.68     3628     7.25         16.22      33.94     1341     19.62        15.90      26.70
            CDS            2553     17.27        19.80      74.82     2391     23.10        19.87      75.28     309      178.73       21.28      56.84
            Intron         11 857   10.13        16.03      29.94     11 038   10.88        16.12      29.97     4704     25.54        16.70      29.60
            Promoters      10 370   6.26         17.35      27.57     9248     7.02         17.54      26.71     3344     19.42        17.26      20.82
            Intergenic     107 658  21.63        16.17      50.49     65 098   28.56        17.12      46.22     21 650   85.88        17.70      42.24
            Total/average  136 039  15.15        16.33      47.01     90 441   22.79        17.11      42.62     30 900   66.70        17.49      37.61

Z. mexicana 5'-UTR         2520     10.25        15.55      26.23     2263     11.42        15.66      24.10     1007     25.66        14.35      17.18
            3'-UTR         3329     7.76         17.34      38.34     3092     8.35         17.42      38.10     1100     23.48        15.81      23.81
            CDS            3636     14.29        17.70      45.01     3252     15.98        17.93      43.96     954      54.47        16.14      23.08
            Intron         8102     14.23        16.24      32.41     7168     16.08        16.48      30.43     3035     37.98        15.63      24.71
            Promoters      10 903   5.86         17.79      34.35     10 035   6.37         17.95      34.04     3627     17.62        16.32      21.71
            Intergenic     107 785  16.50        16.18      49.95     57 813   30.76        17.51      42.07     22 413   79.33        16.64      38.84
            Total/average  133 346  15.46        16.38      47.13     80 545   25.59        17.50      39.82     30 787   66.94        16.45      34.45

P1          5'-UTR         2684     9.60         15.27      24.75     2419     10.65        15.37      22.50     926      27.82        15.44      20.02
            3'-UTR         3478     7.41         17.04      37.07     3230     7.98         17.11      36.90     1044     24.68        16.96      29.53
            CDS            3724     13.91        17.44      44.49     3351     15.46        17.63      43.48     911      56.86        17.51      28.36
            Intron         8622     13.32        15.96      31.18     7659     14.99        16.14      29.13     3073     37.36        16.28      28.44
            Promoters      11 218   5.68         17.56      33.82     10 316   6.18         17.67      33.50     3486     18.28        17.56      24.74
            Intergenic     110 043  16.17        16.06      49.51     59 167   30.07        17.26      41.81     22 475   79.16        17.39      41.03
            Total/average  136 613  15.09        16.24      46.58     82 831   24.88        17.24      39.38     30 621   67.31        17.28      37.09
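The comparison of expected (in silico) and observed allele numbers amounts to a three-way tally per primer pair. A hypothetical Python sketch of that bookkeeping follows, together with the rates implied by the validation counts in Section 3.4 (121 of 151 primer pairs amplified; 112 of 151, about 74%, were polymorphic, in line with the "over 70%" figure). The function name and the toy input are illustrative only.

```python
from collections import Counter

def compare_allele_counts(pairs: list[tuple[int, int]]) -> Counter:
    """Classify each primer pair given as (in_silico_alleles, observed_alleles)."""
    tally = Counter()
    for expected, observed in pairs:
        if expected == observed:
            tally["same"] += 1
        elif expected > observed:
            tally["more_in_silico"] += 1
        else:
            tally["fewer_in_silico"] += 1
    return tally

# Toy input for four hypothetical primer pairs
print(compare_allele_counts([(2, 2), (3, 2), (2, 4), (1, 3)]))

# Rates from the validation counts reported in Section 3.4
amplified, polymorphic, tested = 121, 112, 151
print(round(100 * amplified / tested, 1), round(100 * polymorphic / tested, 1))  # 80.1 74.2
assert 41 + 38 + 42 == amplified   # the three categories cover all 121 working pairs
```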

4. Discussion 

SSRs are co-dominant, abundant, highly polymorphic, and dispersed throughout plant genomes. Genome-wide surveys have found, on average, one SSR every 1.14 kb in Arabidopsis,23 every 3.6 kb in rice,22 every 4 kb in Brassica oleracea,24 every 4.5 kb in soybean,10 every 220 kb in sorghum,25 and every 578 kb in wheat.26 In this study, the average SSR density was one SSR every 15.48 kb. These differences may reflect real genetic differences among plant genomes at the DNA level, as well as differences in sequencing methods and procedures. Because maize inbred B73 was used as the reference genome, some reads from the wild relative Z. mexicana could not be mapped onto the reference, resulting in relatively lower genome coverage and thus fewer SSRs identified than in the other maize inbreds. Therefore, the number of SSR loci identified in Z. mexicana may be underestimated.

In general, SSR distribution differs little among populations or ecotypes of the same species. For instance, a very similar SSR distribution was found in indica and japonica rice, where SSR density (the interval between two SSRs) ranged from one SSR every 2.0 kb to one every 8.1 kb; density was highest in 5'-UTRs (one SSR every 2.1 and 2.0 kb, respectively) and lowest in CDS regions (one SSR every 8.1 and 7.7 kb, respectively).22 Our study revealed a similar SSR distribution pattern across the tested temperate, tropical, and wild maize lines. However, SSRs are not evenly distributed among genomic regions, with much lower SSR density in CDS regions than in UTR and intronic regions. Intriguingly, we found that the majority of SSRs residing in CDS regions were tri-nucleotide repeats, which is consistent with another report and implies specific selection against frame-shift mutations in coding regions.27 Compared with the rice genome, the average SSR length was approximately identical (16-17 bp), but the average GC content of maize SSR sequences was much higher (47%) than that of rice (27%). In the maize genome, the proportions of mono-, di-, and tri-nucleotide SSR motifs were ~60, 20, and 10%, respectively. Tetra-, penta-, and hexa-nucleotide SSR motifs were less abundant, together accounting for 10%, which is in accordance with the report in rice.22 SSR densities also differed among motifs, with the average interval varying from 26 to 950 kb. Moreover, we found that C/G, AT, and CAG/CTG repeats were the most common mono-, di-, and tri-nucleotide SSRs, respectively, in maize, whereas A/T, AG, and AGG/CCT repeats are the most common in rice.22 AT repeats were also the most common dinucleotide motifs in sorghum.25

Figure 3. Experimental validation of six randomly selected SSR markers in 27 genotypes. Lanes 1-27 are PCR products of Zea perennis, Z. diploperennis, Z. mays ssp. parviglumis, Z. mays ssp. huehuetenangensis, Z. nicaraguensis, Z. luxurians, Z. mays ssp. mexicana, RP125, 18Red, 18White, CML206, 81565, A318, P1, Han21, CML85, CML411, Ye478, Mo17, Zheng22, 178, 48-2, B73, Lu9801, ES40, Huangzao4, and Dan598, respectively.
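Labels such as C/G, CAG/CTG, and AAAT/ATTT pool a motif with its reverse complement, since the same repeat reads as either motif on opposite strands. A common way to implement such pooling, sketched below in Python, maps every motif to the lexicographically smallest string among its rotations and the rotations of its reverse complement. Whether the authors also pooled rotations (for example AGC with CAG) is not stated here, so that part is an assumption; the returned key (for example AGC) is an internal identifier, not the label used in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def rotations(seq: str) -> list[str]:
    """All cyclic rotations, e.g. CAG -> CAG, AGC, GCA."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]

def canonical_motif(motif: str) -> str:
    """Smallest string among rotations of the motif and of its reverse complement."""
    motif = motif.upper()
    return min(rotations(motif) + rotations(reverse_complement(motif)))

# Motif pairs named in the text collapse to a single key each
for a, b in [("C", "G"), ("AT", "TA"), ("CAG", "CTG"), ("AAAT", "ATTT"), ("AGG", "CCT")]:
    print(a, b, canonical_motif(a), canonical_motif(b))
```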

Short-read data from next-generation sequencing technologies are now being generated across a range of research projects. The fidelity of these data can be affected by several factors, and mapping errors and gaps still exist to a certain extent.28 However, the availability of the maize genome sequence still affords us a simple and economical way to survey and identify markers, enabling us to develop more convenient molecular markers for breeding applications. Several sets of maize germplasm, including temperate and tropical lines and their wild relatives, have been resequenced using next-generation sequencing technology.18,29-31 There are two major advantages in using the currently available data for the analysis of SSR distribution and marker development. First, maize germplasms from different ecological regions and heterotic groups (PB, SPT, PA, LRC, and BSSS) are highly diverse and host rare and unique alleles, providing an opportunity to use these types of genetic variation in hybrid maize breeding. Second, whole-genome sequence data provide an ideal resource and the most complete picture of genetic variation for developing high-density genetic markers.

SSR markers that are highly polymorphic among diverse germplasms offer advantages in genetics and breeding applications. Despite considerable efforts to develop molecular markers in maize, the number of publicly available SSRs is still limited. From >260 000 SSRs identified in the 17 tested genomes, we detected 111 887 SSR loci with unique flanking sequences and a single binding site through genome sequence BLAST and e-PCR analysis. These SSR loci can be developed as polymorphic markers in silico and made public on the MaizeGDB database; they are ~60 times more numerous than the SSRs deposited in the MaizeGDB database so far. A total of 1556 SSR markers from the MaizeGDB database have specific locations, and 16.6% of the newly developed SSR markers shared the same loci with public SSR markers. For some of the public SSR markers, the amplicon was so large that it contained several newly developed SSR primers with different motifs. Therefore, only 0.47% (527) of the newly developed SSR markers had positions completely compatible with public SSR markers. Another reason for the small number of common SSRs shared by the two datasets may be that the traditional method for SSR marker development was based on screening small-insert or microsatellite-enriched genomic libraries by hybridization in different materials,32 which differs from our analyses based on the B73 reference genome and other resequenced genomes. Furthermore, second-generation sequencing differs from Sanger sequencing, which also contributed to the differences. The experimental validation also detected more alleles than expected from the in silico analysis, owing to the diverse materials used in the study, but some SSR loci with small length differences were hard to distinguish. The average marker density of the newly developed dataset reached one SSR per 14.7 kb in the B73 reference genome, indicating that maize is a highly polymorphic species.33 The availability of abundant SSR markers allows dramatic improvement in the efficiency of marker-assisted selection and fine mapping of QTL regions.
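The requirement of a single binding site can be illustrated with a toy in silico PCR. The Python sketch below is a deliberate simplification, not the e-PCR programme used in this study: it allows only exact primer matches, searches both strands of a single sequence, and uses an assumed maximum product length of 500 bp; a marker is called unique when exactly one product is predicted. All names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_all(text: str, pattern: str) -> list[int]:
    """Start positions of every exact (possibly overlapping) occurrence."""
    hits, i = [], text.find(pattern)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1)
    return hits

def _products(genome: str, left: str, right: str, max_len: int) -> list[tuple[int, int]]:
    """Products primed by `left` on this strand and `right` on the opposite strand."""
    right_site = reverse_complement(right)   # where `right` anneals, read on this strand
    found = []
    for i in find_all(genome, left):
        for j in find_all(genome, right_site):
            end = j + len(right_site)
            if j >= i + len(left) and end - i <= max_len:
                found.append((i, end - i))   # (start, product length)
    return found

def in_silico_pcr(genome: str, fwd: str, rev: str, max_len: int = 500) -> list[tuple[int, int]]:
    genome, fwd, rev = genome.upper(), fwd.upper(), rev.upper()
    # An amplicon may lie in either orientation, so the second pass swaps primer roles
    return sorted(_products(genome, fwd, rev, max_len) + _products(genome, rev, fwd, max_len))

def is_unique_marker(genome: str, fwd: str, rev: str, max_len: int = 500) -> bool:
    return len(in_silico_pcr(genome, fwd, rev, max_len)) == 1

# Toy locus: forward primer site, a (CA)n repeat, then the reverse primer site
genome = "TTTT" + "ACGTAC" + "CA" * 6 + "GGTTAA" + "TTTT"
print(in_silico_pcr(genome, "ACGTAC", "TTAACC"))  # [(4, 24)]
```

Duplicating the toy locus yields several predicted products, so the marker would no longer pass the single-binding-site filter.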

Previous studies have mainly focused on di-, tri-, and tetra-nucleotide SSRs, whereas mono-, penta-, and hexa-nucleotide SSRs have not drawn enough attention for marker development. We found that mono-nucleotide SSRs had much higher polymorphism rates than the others, and penta- and hexa-nucleotide SSRs had relatively longer repeat units. Intron and UTR SSRs were more polymorphic than CDS SSRs owing to low selective pressure in non-coding regions, which is consistent with previous reports.34-36 Experimental validation using 20 maize inbreds and 7 teosinte species showed that over 70% of the primer pairs could generate the target bands with length polymorphism, promising great potential for the application of these SSR markers. In practice, these markers should be very powerful when used for genetic populations derived from the various types of maize germplasm sampled in this study.

Supplementary data: Supplementary data are 
available at www.dnaresearch.oxfordjournals.org. 